[TS] HW01_Answer

Time Series
Author

김보람

Published

October 20, 2023

해당 자료는 전북대학교 이영미 교수님 2023고급시계열분석 자료임

library(lubridate)
library(ggplot2)
library(car)
#library(forecast)
library(lmtest)

Attaching package: ‘lubridate’


The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union


Loading required package: carData

Loading required package: zoo


Attaching package: ‘zoo’


The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

1

z <- c(52,46,46,52,50,50,48,45,41,53)
n <- length(z)

(1)

sum(z)
(bar_z = mean(z))
483
48.3

(3)

sum((z- mean(z))^2)
s2 = sum((z- mean(z))^2)/(n-1)
s2
130.1
14.4555555555556
bar_z + qt(0.975,(n-1)) * sqrt((1+1/n)*s2)
bar_z - qt(0.975,(n-1)) * sqrt((1+1/n)*s2)
57.3206225347423
39.2793774652577

- lm함수 비교

m <- lm(z~1)
summary(m)

Call:
lm(formula = z ~ 1)

Residuals:
   Min     1Q Median     3Q    Max 
  -7.3   -2.3    0.7    3.2    4.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   48.300      1.202   40.17 1.83e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.802 on 9 degrees of freedom
predict(m, newdata=data.frame(t=11:15), interval="prediction")
A matrix: 5 × 3 of type dbl
fit lwr upr
1 48.3 39.27938 57.32062
2 48.3 39.27938 57.32062
3 48.3 39.27938 57.32062
4 48.3 39.27938 57.32062
5 48.3 39.27938 57.32062

2

(3)

z <- c(303,298,303,314,303,314,310,324,317,327,323,324,331,330,332)
n <- length(z)
t <- 1:n
hat_beta_0 = 2*(2*n+1)/n/(n-1) * sum(z) - 6/n/(n-1)*sum(t*z)
hat_beta_0
hat_beta_1 = 12/n/(n^2-1)*sum(t*z) - 6/n/(n-1)*sum(z)
hat_beta_1
297.780952380952
2.3857142857143

(4)

\(\hat Z_n(l) = \hat \beta_0 + \hat \beta_1(n+l)\)

hat_beta_0 + hat_beta_1 *(15+(1:5))
  1. 335.952380952381
  2. 338.338095238095
  3. 340.72380952381
  4. 343.109523809524
  5. 345.495238095238

- lm비교

m <- lm(z~t)
summary(m)

Call:
lm(formula = z ~ t)

Residuals:
   Min     1Q Median     3Q    Max 
-6.710 -2.331 -1.181  2.519  7.133 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  297.781      2.364 125.964  < 2e-16 ***
t              2.386      0.260   9.176 4.84e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.351 on 13 degrees of freedom
Multiple R-squared:  0.8662,    Adjusted R-squared:  0.856 
F-statistic: 84.19 on 1 and 13 DF,  p-value: 4.836e-07
predict(m, newdata = data.frame(t=15+(1:5)), interval="prediction")
A matrix: 5 × 3 of type dbl
fit lwr upr
1 335.9524 325.2553 346.6495
2 338.3381 327.3932 349.2830
3 340.7238 329.5084 351.9393
4 343.1095 331.6025 354.6166
5 345.4952 333.6771 357.3134

3

note: 모의시계열 표본평균과 표본분산을 안구했었네!;

t <- 1:100
####### (1)
Zt_1 <- 100 + rnorm(100,0,1)
####### (2)
Zt_2 <- 500 + rnorm(100,0,1)
####### (3)
Zt_3 <- 100 + rnorm(100,0,10)
####### (4)
Zt_4 <- 100 + t * rnorm(100,0,1)
dt <- data.frame(
 Zt_1 = Zt_1,
 Zt_2 = Zt_2,
 Zt_3 = Zt_3,
 Zt_4 = Zt_4)
round(colMeans(dt),2)
round(apply(dt,2,var),2)
Zt_1
100.04
Zt_2
499.98
Zt_3
100.45
Zt_4
95.8
Zt_1
1.05
Zt_2
0.84
Zt_3
89.68
Zt_4
2449.71
ts.plot(ts(Zt_1), ts(Zt_2), ts(Zt_3), ts(Zt_4),
 lwd=2,
 col= 1:4)
legend(c(0,450), legend = c(expression(Z[t1]), expression(Z[t2]), expression(Z[t3]),expression(Z[t4]),
         col= 1:4, bty='n', lty=1, lwd=2)
ERROR: Error in parse(text = x, srcfile = src): <text>:6:0: unexpected end of input
4: legend(c(0,450), legend = c(expression(Z[t1]), expression(Z[t2]), expression(Z[t3]),expression(Z[t4]),
5:          col= 1:4, bty='n', lty=1, lwd=2)
  ^

4

###### (1)
Zt_1 <- 100 + rnorm(100,0,1)
####### (2)
Zt_2 <- 100 + t + rnorm(100,0,1)
####### (3)
Zt_3 <- 100 + 0.3*t + sin(2*pi*t/12) + cos(2*pi*t/12) + rnorm(100,0,1)
Warning message in 100 + t + rnorm(100, 0, 1):
“longer object length is not a multiple of shorter object length”
Warning message in 100 + 0.3 * t + sin(2 * pi * t/12) + cos(2 * pi * t/12) + rnorm(100, :
“longer object length is not a multiple of shorter object length”
ts.plot(ts(Zt_1), ts(Zt_2), ts(Zt_3),col= 1:3, lwd=2)
legend("topleft", legend = c(expression(Z[t1]), expression(Z[t2]), expression(Z[
 col= 1:3, lty=1, lwd=2)
  • Zt_1 : 불규칙성분

  • Zt_2 : 추세성분 + 불규칙 성분

  • Zt_3 : 추세성분 + 계절성분(주기12) + 불규칙 성분: 계절성분 값이 오차에 비해 너무 작아 확인하기 쉽지 않다.

Zt_3 <- 100 + 0.3*t + 3*sin(2*pi*t/12) + 20*cos(2*pi*t/12) + rnorm(100,0,1)
ts.plot(ts(Zt_1), ts(Zt_2), ts(Zt_3),col= 1:3)
legend("topleft", legend = c(expression(Z[t1]), expression(Z[t2]), expression(Z[
 col= 1:3, lty=1)

5

z <- scan("book.txt")
plot.ts(z)

(3)

\(Z_t = \beta_0 + \beta_1 t + \epsilon_t\)

length(z)
30
t <- 1:length(z)
m <- lm(z~t)
summary(m)

Call:
lm(formula = z ~ t)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5563 -1.0063 -0.2081  1.0385  2.0644 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.19080    0.46960   17.44   <2e-16 ***
t            3.07586    0.02645  116.28   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.254 on 28 degrees of freedom
Multiple R-squared:  0.9979,    Adjusted R-squared:  0.9979 
F-statistic: 1.352e+04 on 1 and 28 DF,  p-value: < 2.2e-16

(4)

\(\hat Z_{n+l} = \hat Z_n(l) = \hat \beta_0 + \hat \beta_1(n+l)\)

new_dt <- data.frame(t=30+(1:12))
predict(m, new_dt)
1
103.542528735632
2
106.618390804598
3
109.694252873563
4
112.770114942529
5
115.845977011494
6
118.92183908046
7
121.997701149425
8
125.073563218391
9
128.149425287356
10
131.225287356322
11
134.301149425287
12
137.377011494253
pred <- predict(m, new_dt, interval="prediction")
pred
A matrix: 12 × 3 of type dbl
fit lwr upr
1 103.5425 100.7996 106.2855
2 106.6184 103.8584 109.3784
3 109.6943 106.9162 112.4723
4 112.7701 109.9731 115.5671
5 115.8460 113.0291 118.6629
6 118.9218 116.0842 121.7595
7 121.9977 119.1384 124.8570
8 125.0736 122.1918 127.9554
9 128.1494 125.2443 131.0546
10 131.2253 128.2960 134.1546
11 134.3011 131.3469 137.2554
12 137.3770 134.3970 140.3570
plot(t, z, type='n', ylim = c(0, 145), xlim=c(1,(30+12)))
lines(t, z, lwd=2)
lines(30+(1:12), pred[,1], col=2, lwd=3)
lines(30+(1:12), pred[,2], col=3, lwd=3, lty=2)
lines(30+(1:12), pred[,3], col=3, lwd=3, lty=2)

6

z <- scan("export.txt")
n <- length(z)
n
86
z_ts <- ts(z, frequency=12)
cycle(z_ts)
A Time Series: 8 × 12
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1 2 3 4 5 6 7 8 9 10 11 12
2 1 2 3 4 5 6 7 8 9 10 11 12
3 1 2 3 4 5 6 7 8 9 10 11 12
4 1 2 3 4 5 6 7 8 9 10 11 12
5 1 2 3 4 5 6 7 8 9 10 11 12
6 1 2 3 4 5 6 7 8 9 10 11 12
7 1 2 3 4 5 6 7 8 9 10 11 12
8 1 2
plot.ts(z)

plot.ts(log(z))

추세성분, 계절성분(주기=12), 불규칙 성분으로 구성되어 있음

계절추세모형

\(Z_t = \beta_0+ \beta_1t + \sum_{i=1}^12 \delta_i \times I_{t1} + \epsilon_t\), 단 \(\delta_1=0\)

note: 나는 로그변환 한 값을 넣었음

z_ts <- ts(z, frequency=12)
cycle(z_ts)
A Time Series: 8 × 12
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1 2 3 4 5 6 7 8 9 10 11 12
2 1 2 3 4 5 6 7 8 9 10 11 12
3 1 2 3 4 5 6 7 8 9 10 11 12
4 1 2 3 4 5 6 7 8 9 10 11 12
5 1 2 3 4 5 6 7 8 9 10 11 12
6 1 2 3 4 5 6 7 8 9 10 11 12
7 1 2 3 4 5 6 7 8 9 10 11 12
8 1 2
seasonal_I <- as.factor(cycle(z_ts))
t <- 1:n
m2 <- lm(z~t+seasonal_I)
summary(m2)

Call:
lm(formula = z ~ t + seasonal_I)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.8562  -2.2938   0.1567   2.6730   9.3951 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  21.98000    1.73500  12.669  < 2e-16 ***
t             0.43721    0.01893  23.097  < 2e-16 ***
seasonal_I2   1.71779    2.16697   0.793 0.430512    
seasonal_I3   9.21741    2.24422   4.107 0.000103 ***
seasonal_I4   7.37163    2.24366   3.286 0.001566 ** 
seasonal_I5   9.30299    2.24326   4.147 8.98e-05 ***
seasonal_I6  12.96578    2.24302   5.780 1.72e-07 ***
seasonal_I7   9.16286    2.24294   4.085 0.000112 ***
seasonal_I8   7.73422    2.24302   3.448 0.000941 ***
seasonal_I9  11.07272    2.24326   4.936 4.88e-06 ***
seasonal_I10 10.68409    2.24366   4.762 9.47e-06 ***
seasonal_I11 12.37545    2.24422   5.514 5.03e-07 ***
seasonal_I12 18.57967    2.24494   8.276 4.26e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.334 on 73 degrees of freedom
Multiple R-squared:  0.9011,    Adjusted R-squared:  0.8848 
F-statistic:  55.4 on 12 and 73 DF,  p-value: < 2.2e-16
  • 적합된 모형: \(\hat Z_t = 21.98 + 0.437t + 1.72 I_{t2} + 9.22 I_{t3} + \dots 18.58 I_{t12}\)

(4) 모형설명(제약조건 주의!)

  • 1개월이 지날 때마다 월별수출액은 평균적으로 0.437억$증가

  • 2월은 1월에 비해 월별수출액이 1.72억$ 증가

  • 3월은 1월에 비해 월별수출액이 9.22억$ 증가

  • 12월은 1월에 비해 평균 월별수출액이 18.58억$ 증가

(5)

\(\hat Z_{n+l} = \hat Z_n(l) = \hat \beta_0 + \hat \beta_1(n+l) + \sum_{i=1}^{12} \delta_i I_{t1}\)

new_dt <- data.frame( t = n + (1:12),
 seasonal_I = as.factor(c(3:12,1,2)))
new_dt
A data.frame: 12 × 2
t seasonal_I
<int> <fct>
87 3
88 4
89 5
90 6
91 7
92 8
93 9
94 10
95 11
96 12
97 1
98 2

(6) 95% 예측 구간

predict(m2, new_dt)
1
69.2346153846154
2
67.8260439560439
3
70.1946153846154
4
74.2946153846154
5
70.9289010989011
6
69.9374725274725
7
73.7131868131868
8
73.7617582417582
9
75.8903296703297
10
82.5317582417582
11
64.3892994505495
12
66.5442994505494
pred <- predict(m2, new_dt, interval = 'prediction')
pred
A matrix: 12 × 3 of type dbl
fit lwr upr
1 69.23462 59.82517 78.64407
2 67.82604 58.41659 77.23549
3 70.19462 60.78517 79.60407
4 74.29462 64.88517 83.70407
5 70.92890 61.51945 80.33835
6 69.93747 60.52802 79.34692
7 73.71319 64.30374 83.12264
8 73.76176 64.35231 83.17121
9 75.89033 66.48088 85.29978
10 82.53176 73.12231 91.94121
11 64.38930 55.00439 73.77421
12 66.54430 57.15939 75.92921

69.23462 = \(\hat Z_{87} = \hat Z_{86}(1) = \hat \beta_0 + \hat \beta_1 \times 87 + \delta_3 \times I_{87 \times 3}\) (3월)

70.92890 = $ Z_{86}(5) = _0 + _1 (86+5) + 7 I{91 }$ (7월)

(7)

min_ <- min(c(z, pred[,2]))
max_ <- max(c(z, pred[,3]))
plot(t, z, type='n', ylim = c(min_, max_), xlim=c(1,(n+12)))
lines(t, z, lwd=2)
lines(n+(1:12), pred[,1], col=2, lwd=3)
lines(n+(1:12), pred[,2], col=3, lwd=3, lty=2)
lines(n+(1:12), pred[,3], col=3, lwd=3, lty=2)

(번외) 잔차검정

resid_m <- resid(m2)
plot(resid_m, pch=16, type='l')
abline(h=0, lty=2)

t.test(resid_m)

    One Sample t-test

data:  resid_m
t = 2.6111e-16, df = 85, p-value = 1
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.861081  0.861081
sample estimates:
   mean of x 
1.130835e-16 
hist(resid_m)

lmtest::bptest(m2)

    studentized Breusch-Pagan test

data:  m2
BP = 18.466, df = 12, p-value = 0.1023
lmtest::dwtest(m2, alternative="two.sided")

    Durbin-Watson test

data:  m2
DW = 0.79196, p-value = 5.014e-09
alternative hypothesis: true autocorrelation is not 0

다항추세 사용

m3 <- lm(z ~ t + I(t^2) + seasonal_I)
summary(m3)

Call:
lm(formula = z ~ t + I(t^2) + seasonal_I)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.2252 -2.1156  0.0414  2.2895  7.5664 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)  17.3011107  1.6527443  10.468 4.12e-16 ***
t             0.7955399  0.0637685  12.475  < 2e-16 ***
I(t^2)       -0.0041187  0.0007103  -5.799 1.65e-07 ***
seasonal_I2   1.7177908  1.8014987   0.954 0.343510    
seasonal_I3   8.5584096  1.8691776   4.579 1.91e-05 ***
seasonal_I4   6.6796790  1.8690680   3.574 0.000633 ***
seasonal_I5   8.5863287  1.8690137   4.594 1.81e-05 ***
seasonal_I6  12.2326445  1.8690050   6.545 7.54e-09 ***
seasonal_I7   8.4214835  1.8690354   4.506 2.50e-05 ***
seasonal_I8   6.9928457  1.8691016   3.741 0.000365 ***
seasonal_I9  10.3395882  1.8692037   5.532 4.84e-07 ***
seasonal_I10  9.9674253  1.8693449   5.332 1.07e-06 ***
seasonal_I11 11.6835000  1.8695317   6.249 2.59e-08 ***
seasonal_I12 17.9206692  1.8697737   9.584 1.72e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.603 on 72 degrees of freedom
Multiple R-squared:  0.9326,    Adjusted R-squared:  0.9204 
F-statistic: 76.58 on 13 and 72 DF,  p-value: < 2.2e-16
summary(m2)

Call:
lm(formula = z ~ t + seasonal_I)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.8562  -2.2938   0.1567   2.6730   9.3951 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  21.98000    1.73500  12.669  < 2e-16 ***
t             0.43721    0.01893  23.097  < 2e-16 ***
seasonal_I2   1.71779    2.16697   0.793 0.430512    
seasonal_I3   9.21741    2.24422   4.107 0.000103 ***
seasonal_I4   7.37163    2.24366   3.286 0.001566 ** 
seasonal_I5   9.30299    2.24326   4.147 8.98e-05 ***
seasonal_I6  12.96578    2.24302   5.780 1.72e-07 ***
seasonal_I7   9.16286    2.24294   4.085 0.000112 ***
seasonal_I8   7.73422    2.24302   3.448 0.000941 ***
seasonal_I9  11.07272    2.24326   4.936 4.88e-06 ***
seasonal_I10 10.68409    2.24366   4.762 9.47e-06 ***
seasonal_I11 12.37545    2.24422   5.514 5.03e-07 ***
seasonal_I12 18.57967    2.24494   8.276 4.26e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.334 on 73 degrees of freedom
Multiple R-squared:  0.9011,    Adjusted R-squared:  0.8848 
F-statistic:  55.4 on 12 and 73 DF,  p-value: < 2.2e-16
  • 2차 추세도 유의하고, 수정된 결정계수의 값이 증가 하였기 때문에 2차 추세 모형 사용 가능
resid_m3 <- resid(m3)
plot.ts(resid_m3)
abline(h=0, lty=2)

t.test(resid_m3)

    One Sample t-test

data:  resid_m3
t = -4.2693e-16, df = 85, p-value = 1
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.7109349  0.7109349
sample estimates:
    mean of x 
-1.526557e-16 
lmtest::bptest(m3)

    studentized Breusch-Pagan test

data:  m3
BP = 11.389, df = 13, p-value = 0.5783
lmtest::dwtest(m3)

    Durbin-Watson test

data:  m3
DW = 1.1633, p-value = 4.187e-05
alternative hypothesis: true autocorrelation is greater than 0